Optimal weighted nearest neighbour classifiers
Authors
Abstract
We derive an asymptotic expansion for the excess risk (regret) of a weighted nearest-neighbour classifier. This allows us to find the asymptotically optimal vector of nonnegative weights, which has a rather simple form. We show that the ratio of the regret of this classifier to that of an unweighted k-nearest neighbour classifier depends asymptotically only on the dimension d of the feature vectors, and not on the underlying populations. The improvement is greatest when d = 4, but thereafter decreases as d → ∞. The popular bagged nearest neighbour classifier can also be regarded as a weighted nearest neighbour classifier, and we show that its corresponding weights are somewhat suboptimal when d is small (in particular, worse than those of the unweighted k-nearest neighbour classifier when d = 1), but are close to optimal when d is large. Finally, we argue that improvements in the rate of convergence are possible under stronger smoothness assumptions, provided we allow negative weights. Our findings are supported by an empirical performance comparison on both simulated and real data sets.
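The abstract describes classification by a weighted vote over the training points ordered by distance to the query, with the unweighted k-nearest neighbour rule as the special case of uniform weights on the first k neighbours. The paper's optimal weight vector is derived in the full text and not stated in the abstract, so the sketch below (Python, with hypothetical function names) shows only the general weighted-vote scheme into which any nonnegative weight vector summing to one can be plugged:

```python
import numpy as np

def weighted_nn_classify(X_train, y_train, x, weights):
    """Weighted nearest-neighbour classifier for binary labels in {0, 1}.

    Sorts the training points by distance to the query x and takes a
    weighted vote among their labels; `weights` is a nonnegative vector
    summing to 1, one entry per training point, ordered by neighbour
    rank (entries beyond some position k are typically zero).
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists)                 # nearest neighbour first
    vote = np.dot(weights, y_train[order])    # weighted fraction voting for class 1
    return int(vote >= 0.5)

def uniform_knn_weights(n, k):
    """Weight vector reproducing the unweighted k-NN classifier."""
    w = np.zeros(n)
    w[:k] = 1.0 / k
    return w
```

With `uniform_knn_weights(n, k)` this reduces exactly to majority voting over the k nearest neighbours; the paper's contribution is the choice of a non-uniform weight vector minimising the asymptotic regret.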
Similar papers
Ensembles of nearest neighbour classifiers and serial analysis of gene expression
In this paper, we present experimental results obtained with ensembles of nearest neighbour classifiers on the binary classification problem of cancer classification using serial analysis of gene expression (SAGE) data. Nearest neighbours are selected as classifiers because they have rarely been employed in building ensembles, since their predictions are stable under small perturbations of the data, which...
Stabilized Nearest Neighbor Classifier and Its Statistical Properties
Stability has been of great concern in statistics: similar statistical conclusions should be drawn from different data sampled from the same population. In this article, we introduce a general measure of classification instability (CIS) to capture the sampling variability of the predictions made by a classification procedure. The minimax rate of CIS is established for general plug-in clas...
Multi-hypothesis nearest-neighbor classifier based on class-conditional weighted distance metric
The performance of nearest-neighbor (NN) classifiers is known to be very sensitive to the distance metric used in classifying a query pattern, especially in scarce-prototype cases. In this paper, a class-conditional weighted (CCW) distance metric related to both the class labels of the prototypes and the query patterns is proposed. Compared with the existing distance metrics, the proposed metric...
Ensembles of Nearest Neighbours for Cancer Classification Using Gene Expression Data
It is known that an ensemble of classifiers can outperform a single best classifier if the classifiers in the ensemble are accurate and sufficiently diverse (i.e., their errors are as uncorrelated as possible). We study ensembles of nearest neighbours for cancer classification based on gene expression data. Such ensembles have rarely been used, because traditional ensemble methods such as ...
Properties of bagged nearest neighbour classifiers
It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size, in the case of with-replacement bagging, or less than 50% of the sample size, for without-replacement bagging. However, for larger sampling fractions there is no asymptotic difference be...
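The main abstract above notes that the bagged nearest neighbour classifier can itself be viewed as a weighted nearest neighbour classifier: averaging 1-NN predictions over many resamples puts geometrically decaying effective weight on the neighbour ranks. The sketch below is a stylized approximation only (the geometric form, with each training point retained independently with probability q, is an assumption made here for illustration and is not stated in the snippet above):

```python
import numpy as np

def bagged_nn_weights(n, q):
    """Approximate effective weights of infinite-simulation bagged 1-NN.

    Assumption: each training point is retained in a resample
    independently with probability q, so the i-th nearest neighbour
    casts the deciding vote exactly when it is the closest retained
    point, with probability q * (1 - q)**(i - 1). Renormalising over
    the n training points yields a proper weight vector.
    """
    i = np.arange(1, n + 1)
    w = q * (1.0 - q) ** (i - 1)
    return w / w.sum()
```

Under this approximation the resampling fraction q controls how fast the weights decay with neighbour rank, which is the sense in which the choice of sampling fraction governs the asymptotic behaviour discussed above.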